
    Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms

    In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach to this drawback has been to apply an arbitrary criterion to break ties between distances, which yields different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that groups more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of the Lance and Williams formula that enables a recursive implementation of the algorithm. Comment: Free Software for Agglomerative Hierarchical Clustering using Multidendrograms available at http://deim.urv.cat/~sgomez/multidendrograms.ph
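
A minimal sketch of the variable-group idea, assuming single or complete linkage only: at each step, every pair of clusters at the current minimum distance is linked, and each connected group of tied clusters is merged in one step instead of breaking ties arbitrarily. The function name `variable_group_linkage` and the tolerance `tol` are hypothetical; this is not the authors' MultiDendrograms software, and it records only the merge height, not the full multidendrogram drawing.

```python
import itertools


def variable_group_linkage(dist, linkage="complete", tol=1e-12):
    """Hypothetical sketch: agglomerate clusters, merging all tied clusters at once.

    dist    -- symmetric matrix (list of lists) of pairwise item distances
    linkage -- "single" (min) or "complete" (max); other linkages are not handled
    tol     -- distances within tol of the minimum are treated as ties
    Returns a list of (height, [clusters merged at that height]) steps.
    """
    combine = min if linkage == "single" else max
    clusters = [frozenset([i]) for i in range(len(dist))]

    def d(a, b):
        # Single/complete linkage computed from item distances; for these two
        # linkages this matches the recursive (Lance-Williams-style) update.
        return combine(dist[i][j] for i in a for j in b)

    steps = []
    while len(clusters) > 1:
        pairs = list(itertools.combinations(range(len(clusters)), 2))
        pair_dist = {pq: d(clusters[pq[0]], clusters[pq[1]]) for pq in pairs}
        h = min(pair_dist.values())

        # Union-find: link every pair of clusters whose distance ties the minimum.
        parent = list(range(len(clusters)))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for (p, q), dpq in pair_dist.items():
            if dpq - h <= tol:
                parent[find(p)] = find(q)

        # Each connected group of tied clusters becomes one new cluster.
        groups = {}
        for idx, cluster in enumerate(clusters):
            groups.setdefault(find(idx), []).append(cluster)
        for members in groups.values():
            if len(members) > 1:
                steps.append((h, sorted(members, key=min)))
        clusters = [frozenset().union(*members) for members in groups.values()]
    return steps


points = [0, 1, 2, 10]  # 1-D positions; the pairs 0-1 and 1-2 tie at distance 1
dist = [[abs(a - b) for b in points] for a in points]
for height, merged in variable_group_linkage(dist, "single"):
    print(height, [sorted(c) for c in merged])
# prints "1 [[0], [1], [2]]" then "8 [[0, 1, 2], [3]]"
```

With pairwise tie-breaking, the first step would merge only two of the three tied singletons, and which two depends on the criterion; the variable-group step merges all three at height 1.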

    Analysis of Agglomerative Clustering

    The diameter $k$-clustering problem is the problem of partitioning a finite subset of $\mathbb{R}^d$ into $k$ subsets called clusters such that the maximum diameter of the clusters is minimized. One early clustering algorithm that computes a hierarchy of approximate solutions to this problem (for all values of $k$) is the agglomerative clustering algorithm with the complete linkage strategy. For decades, this algorithm has been widely used by practitioners. However, it has not been well studied theoretically. In this paper, we analyze the agglomerative complete linkage clustering algorithm. Assuming that the dimension $d$ is a constant, we show that for any $k$ the solution computed by this algorithm is an $O(\log k)$-approximation to the diameter $k$-clustering problem. Our analysis holds not only for the Euclidean distance but for any metric that is based on a norm. Furthermore, we analyze the closely related $k$-center and discrete $k$-center problems. For the corresponding agglomerative algorithms, we deduce an approximation factor of $O(\log k)$ as well. Comment: A preliminary version of this article appeared in Proceedings of the 28th International Symposium on Theoretical Aspects of Computer Science (STACS '11), March 2011, pp. 308-319. This article also appeared in Algorithmica. The final publication is available at http://link.springer.com/article/10.1007/s00453-012-9717-
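
For intuition, a naive sketch of agglomerative clustering with the complete linkage strategy, stopped at $k$ clusters, together with the maximum-diameter objective it is judged by. The helper names are hypothetical and points are assumed to be Euclidean tuples; the sketch makes no attempt to reproduce the paper's $O(\log k)$ analysis, and because it recomputes cluster distances from scratch at every step it is only suitable for small inputs.

```python
import itertools
import math


def cluster_distance(a, b):
    """Complete linkage: the largest distance between a point of a and a point of b."""
    return max(math.dist(p, q) for p in a for q in b)


def diameter(cluster):
    """Largest pairwise distance inside one cluster (0.0 for a singleton)."""
    return max((math.dist(p, q) for p, q in itertools.combinations(cluster, 2)), default=0.0)


def complete_linkage(points, k):
    """Merge the two closest clusters (complete linkage) until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0)]
clusters = complete_linkage(points, k=3)
print(max(diameter(c) for c in clusters))  # maximum cluster diameter: 1.0
```

Running the same loop down to $k = 1$ yields the whole hierarchy, which is why a single run gives a solution for every value of $k$.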

    Spatio-Temporal Dynamics of Caddisflies in Streams of Southern Western Ghats

    The dynamics of physico-chemical factors and their effects on caddisfly communities were examined in 29 streams of southern Western Ghats. Monthly samples were collected from the Thadaganachiamman stream of Sirumalai Hills, Tamil Nadu, from May 2006 to April 2007. Southwest and northeast monsoons favored the existence of caddisfly populations in streams. A total of 20 caddisfly taxa were collected from 29 streams of southern Western Ghats. Hydropsyche (Trichoptera: Hydropsychidae) were more widely distributed across sampling sites than the other taxa. Canonical correspondence analysis showed that elevation was a major variable and pH, stream order, and stream substrates were minor variables affecting taxa richness. These results suggested that habitat heterogeneity and seasonal changes were stronger predictors of caddisfly assemblages than large-scale patterns in landscape diversity.

    How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling

    Whatever the phylogenetic method, genetic sequences are often described as strings of characters, thus molecular sequences can be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the amazing features of high-dimensional spaces like the concentration of measure phenomenon
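
The abstract is truncated here, but the concentration of measure effect it mentions can be illustrated with a small experiment, assuming points drawn uniformly from the unit hypercube (an assumption for illustration, not the paper's setup): as the dimension grows, the relative gap between the largest and smallest pairwise distances shrinks. The function `relative_contrast` is hypothetical and unrelated to the Fitch-Margoliash method itself.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """(max - min) / min over all pairwise Euclidean distances of uniform random points."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(pts[i], pts[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# The contrast drops sharply with dimension: distances "concentrate".
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(50, dim), 3))
```

When all pairwise distances are nearly equal, small estimation errors can reorder them, which is one reason distance-based methods need care in high-dimensional sequence spaces.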